perf: reuse searcher across batch queries in find_evidence.py by tarilabs · Pull Request #145 · redhat-documentation/redhat-docs-agent-tools

tarilabs · 2026-05-04T13:10:52Z

In batch mode, ensure_index() was called per query via retrieve_evidence(), redundantly reloading the SentenceTransformer model, reopening the Milvus DB, and rebuilding the entire BM25 index on every iteration (~2-6s overhead each).

Call ensure_index() once before the loop and reuse the returned searcher for all queries within each batch invocation. This saves ~18-84s in the scope-req-audit step (~10-15 queries) and ~18-174s in the code-evidence step (~10-30 queries with two-pass retrieval).

Single-query mode is unchanged — it still delegates to retrieve_evidence().
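The hoisting described above can be sketched with stand-ins. This is a minimal illustration, not the script's actual code: `FakeSearcher`, the stub `ensure_index`, `run_batch`, and `estimated_savings` are hypothetical names. The real `ensure_index()` loads a SentenceTransformer model, opens Milvus, and builds a BM25 index; the stub only counts how often it is called.

```python
"""Sketch: build the index once, then reuse the searcher for every query.

All names are hypothetical stand-ins for illustration only.
"""

LOAD_COUNT = 0  # how many times the (pretend) expensive setup ran


class FakeSearcher:
    """Stand-in for the searcher object returned by ensure_index()."""

    def search(self, query):
        return [f"match for {query!r}"]


def ensure_index(repo):
    """Stub: count each call in place of the real multi-second setup."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakeSearcher(), {"repo": repo}


def run_batch(repo, queries):
    # Hoisted out of the loop: one setup per batch invocation.
    searcher, _index_info = ensure_index(repo)
    return [searcher.search(q) for q in queries]


def estimated_savings(num_queries, overhead_s):
    """Seconds saved when setup runs once instead of once per query."""
    return (num_queries - 1) * overhead_s


results = run_batch("/path/to/repo", ["q1", "q2", "q3"])
print(LOAD_COUNT)                # setup ran once for three queries
print(estimated_savings(10, 2))  # 10 queries at 2s of overhead each
print(estimated_savings(15, 6))  # 15 queries at 6s of overhead each
```

Under the assumption that every query after the first avoids one full setup, `(n - 1) * overhead` is consistent with the ranges quoted in the description: 18s for 10 queries at 2s, 84s for 15 queries at 6s, and 174s for 30 queries at 6s.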

Summary by CodeRabbit

Improvements
- Batch retrieval is faster and more efficient for multi-query runs.
- Path-based filtering now correctly resolves relative paths against the repository root.
- Search result output shows cleaner, rounded relevance scores for clearer readability.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>

coderabbitai · 2026-05-04T13:11:07Z

Walkthrough

Batch evidence retrieval was refactored to build and reuse a shared search index via ensure_index, resolve filter_paths relative to the repo root, and format raw search matches with a new _format_result. Single-query mode still uses retrieve_evidence via _run_single.

Changes

Batch Retrieval Refactor

Layer / File(s)	Summary
Imports `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (line 26)	Adds `Path` import from `pathlib` for repository-root path resolution.
Utilities / Data Shape `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (lines 36–75)	Adds `_resolve_filter_paths(repo_path, filter_paths)` to resolve/normalize filter paths and `_format_result(query, filter_paths, repo_path, index_info, results)` to convert raw searcher matches into the evidence output structure with per-match fields and rounded `vector`, `bm25`, and `combined` scores.
Module Imports / Indexing `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (lines 133–144)	Imports `ensure_index` (in addition to keeping `retrieve_evidence`) to enable creating/reusing a shared index for batch runs; updates related comments.
Batch Mode Core Change `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (lines 159–174)	Batch flow now calls `ensure_index(args.repo, reindex=args.reindex)` once to obtain `searcher` and `index_info`, resolves each entry’s `filter_paths` with `_resolve_filter_paths`, runs `searcher.search(...)` for each query, and formats results with `_format_result`; removes prior per-query reindex-on-first-query logic.
Output Shape / Wiring `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (lines 172–174)	Maintains the outer results list objects `{ "query": ..., "filter_paths": ..., "result": ... }`, but `result` now contains the formatted payload from `_format_result(...)` instead of the previous per-query `retrieve_evidence` output.
Single-Query Mode `plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` (lines 144–158)	Retains `_run_single(...)` behavior that calls `retrieve_evidence` for single-query invocations; comments/parse logic updated to align with the new batch interface.
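The two helpers in the table can be approximated as follows. This is a sketch under stated assumptions: only the list comprehension in `resolve_filter_paths` matches a line visible in the PR diff. The function names, the empty-input guard, the `ndigits=4` precision, and the shape of a match dict are hypothetical.

```python
from pathlib import Path


def resolve_filter_paths(repo_path, filter_paths):
    """Resolve each filter path against the repository root (hypothetical helper)."""
    if not filter_paths:
        return filter_paths  # assumption: empty or None means "no filter"
    repo_root = Path(repo_path).resolve()
    return [str((repo_root / p).resolve()) for p in filter_paths]


def round_scores(match, ndigits=4):
    """Round the three relevance scores; the precision is an assumption."""
    return {key: round(match[key], ndigits) for key in ("vector", "bm25", "combined")}


resolved = resolve_filter_paths("/tmp/repo", ["src/a.py", "docs/../src/b.py"])
scores = round_scores({"vector": 0.87654321, "bm25": 0.5, "combined": 0.123456})
print(resolved)
print(scores)
```

Because `pathlib` discards the left operand when the right operand of `/` is absolute, an absolute filter path passes through this sketch unchanged, and `resolve()` collapses `..` segments.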

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 10

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'perf: reuse searcher across batch queries in find_evidence.py' accurately and specifically describes the main performance optimization in the PR—reusing a shared search index across batch queries instead of rebuilding it per query.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
No Real People Names In Style References	✅ Passed	No instances of real people's names used as style references in code-evidence skill files or documentation were found.
Git Safety Rules	✅ Passed	The pull request modifies only the code evidence search script with performance optimizations and no git operations or safety violations.
No Untrusted Mcp Servers	✅ Passed	The pull request modifies only a Python utility script for code evidence retrieval with performance optimizations and no MCP server installations or untrusted dependencies.
Skill And Script Conventions	✅ Passed	The file `find_evidence.py` adheres to all Skill and Script Conventions requirements. It uses no `plugin:` prefixed skill references or old slash-command syntax. Script invocation patterns shown in the docstring correctly use relative paths for co-located script calls. The imports are proper Python library imports from the code-finder package rather than script invocations, and no cross-skill script calls that would require `${CLAUDE_PLUGIN_ROOT}` are present in the modified code.
Plugin Registry Consistency	✅ Passed	PR modifies only Python implementation code, not plugin.json, marketplace.json, or plugin documentation files.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Around line 133-134: The import ordering in find_evidence.py is unsorted and
causing lint failure: move the import of
claude_context.skills._index_manager.ensure_index so it appears before
claude_context.skills.evidence_retrieval.retrieve_evidence (i.e., ensure_index
import should precede retrieve_evidence), then re-run the project's import
sorter/formatter (or manually reorder the two import lines) to satisfy I001.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad13bc82-68d5-4b48-823a-975e819d430b

📥 Commits

Reviewing files that changed from the base of the PR and between 4651e72 and 3e58c2f.

📒 Files selected for processing (1)

plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Line 44: The function _format_result declares an unused parameter
filter_paths; remove filter_paths from the function signature and update any
call sites that pass filter_paths (the invocation that builds the outer wrapper
dict) to stop supplying it—ensure the wrapper still adds filter_paths separately
as before; only change _format_result's signature and caller arguments, leaving
the body and other returned fields untouched.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 054acbb3-523e-41a8-bd8a-45faf40acd8f

📥 Commits

Reviewing files that changed from the base of the PR and between 3e58c2f and 5e18ff3.

📒 Files selected for processing (1)

plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py

coderabbitai · 2026-05-04T14:58:26Z

+    return [str((repo_root / p).resolve()) for p in filter_paths]
+
+
+def _format_result(query, filter_paths, repo_path, index_info, results):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused filter_paths parameter.

The filter_paths parameter is declared but never referenced in the function body. At line 172, filter_paths is added to the outer wrapper dict separately, not via this function.

Proposed fix

-def _format_result(query, filter_paths, repo_path, index_info, results):
+def _format_result(query, repo_path, index_info, results):

And update the call site at line 171:

- result = _format_result(query, filter_paths, repo_path, index_info, raw)
+ result = _format_result(query, repo_path, index_info, raw)
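The shape after this fix might look like the sketch below. The outer wrapper keys (`query`, `filter_paths`, `result`) come from the walkthrough; the inner payload fields and the names `format_result` and `wrap_entry` are hypothetical, since the function body is not shown in this conversation.

```python
def format_result(query, repo_path, index_info, results):
    """Build the per-query payload; filter_paths is no longer a parameter.

    The field names in the returned dict are assumptions for illustration.
    """
    return {
        "query": query,
        "repo_path": repo_path,
        "index_info": index_info,
        "results": results,
    }


def wrap_entry(query, filter_paths, repo_path, index_info, raw):
    """Attach filter_paths in the outer wrapper, not inside the payload."""
    result = format_result(query, repo_path, index_info, raw)
    return {"query": query, "filter_paths": filter_paths, "result": result}


entry = wrap_entry("auth flow", ["src"], "/path/to/repo", {"chunks": 0}, [])
print(sorted(entry))
```

The caller still records `filter_paths` next to each query, so dropping the parameter changes nothing in the emitted output.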


coderabbitai Bot reviewed May 4, 2026

Comment thread plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py Outdated

Update plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py

5e18ff3

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai Bot reviewed May 4, 2026

tarilabs wants to merge 2 commits into redhat-documentation:main from tarilabs:tarilabs-20260504-reuseindex
